Exploiting Pipelined Executions in OpenMP
Abstract
This paper proposes a set of extensions to the OpenMP programming model to express point-to-point synchronization schemes. This is accomplished by defining, in the form of directives, precedence relations among the tasks that originate from OpenMP work-sharing constructs. The proposal is based on the definition of a name space that identifies the work parceled out by these work-sharing constructs; the programmer then defines the precedence relations using this name space. This relieves the programmer of the burden of defining complex synchronization data structures and inserting explicit synchronization actions, both of which make a program difficult to understand and maintain. The paper briefly describes the main aspects of the runtime implementation required to support precedence relations in OpenMP, and focuses on evaluating the proposal through its use in two benchmarks: NAS LU and ASCI Sweep3D.
Similar resources
Shared Memory Pipelined Parareal
The paper introduces an OpenMP implementation of pipelined Parareal and compares it to a standard MPI-based implementation. Both versions yield essentially identical runtimes, but, depending on the compiler, the OpenMP variant consumes about 7% less energy. However, its key advantage is a significantly smaller memory footprint. The higher implementation complexity, including manual control of l...
Accurate and Complete Hardware Profiling for OpenMP - Multiplexing Hardware Events Across Executions
Analyzing the behavior of OpenMP programs and their interaction with the hardware is essential for locating performance bottlenecks and identifying performance optimization opportunities. However, current architectures only provide a small number of dedicated registers to quantify hardware events, which strongly limits the scope of performance analyses. Hardware event multiplexing can help cove...
The Feasibility of Using OpenCL Instead of OpenMP for Parallel CPU Programming
OpenCL, along with CUDA, is one of the main tools used to program GPGPUs. However, it allows running the same code on multi-core CPUs too, making it a rival for the long-established OpenMP. In this paper we compare OpenCL and OpenMP when developing and running compute-heavy code on a CPU. Both ease of programming and performance aspects are considered. Since, unlike a GPU, no memory copy operat...
A Pipelined Execution of Tiled Nested Loops on SMPs with Computation and Communication Overlapping
This paper proposes a novel approach for the parallel execution of tiled Iteration Spaces onto a cluster of SMP PC nodes. Each SMP node has multiple CPUs and a single memory mapped PCI-SCI Network Interface Card. We apply a hyperplane-based grouping transformation to the tiled space, so as to group together independent neighboring tiles and assign them to the same SMP node. In this way, intrano...
A pipelined schedule to minimize completion time for loop tiling with computation and communication overlapping
This paper proposes a new method for the problem of minimizing the execution time of nested for-loops using a tiling transformation. In our approach, we are interested not only in tile size and shape according to the required communication to computation ratio, but also in overall completion time. We select a time hyperplane to execute different tiles much more efficiently by exploiting the inh...